ABSTRACT

Seven years on from OWL becoming a W3C recommendation, and 
two years on from the more recent OWL 2 W3C recommendation, 
OWL has still experienced only patchy uptake on the Web. Although
certain OWL features (like owl:sameAs) are very popular,
other features of OWL are largely neglected by publishers in the 
Linked Data world. This may suggest that despite the promise of 
easy implementations and the proposal of tractable profiles suggested
in OWL's second version, there is still no "right" standard
fragment for the Linked Data community. In this paper, we (1) 
analyse uptake of OWL on the Web of Data, (2) gain insights into 
the OWL fragment that is actually used/usable on the Web, where 
we arrive at the conclusion that this fragment is likely to be a simplified
profile based on OWL RL, (3) propose and discuss such a
new fragment, which we call OWL LD (for Linked Data). 

1. INTRODUCTION 

Under the initial impetus of the Linking Open Data project - 
and guided by the Linked Data principles [3] and associated best- 
practices - a rich vein of openly-available structured data has been 
published on the Web using Semantic Web standards. Publishing 
RDF on the Web is no longer confined to academia and hobbyists: 
the current "Web of Data" now features exports from various cor- 
porate and commercial bodies (e.g., BBC, New York Times, Free- 
base, BestBuy), online communities (e.g., Wikipedia, Geonames), 
life-science corpora (e.g., DrugBank, Linked Clinical Trials) and 
governmental bodies (e.g., data.gov, data.gov.uk, EuroStat). The 
"Linked Open Data cloud" now depicts 295 interlinked datasets, 
which together consist of an estimated 31.6 billion RDF triples.¹

Although RDF provides standard syntaxes and a common data- 
model for disseminating structured information, it offers very lit- 
tle when it comes to giving semantics to the published data. RDF 
Schema (RDFS) and OWL were developed to address this by pro- 
viding a vocabulary for describing schema data. The special vo- 
cabulary terms of RDFS and OWL - such as rdfs:subClassOf or 
owl:FunctionalProperty - have a well-defined semantics, which 
can be used to derive implicit consequences from the data. 

¹http://www4.wiwiss.fu-berlin.de/lodcloud/state/

In terms of publishing, parts of the RDFS and OWL standards
have been adopted on the Web of Data. Linked Data literature recommends
use of owl:sameAs relations to denote when two URIs
refer to the same resource [18, § 2.5.2]. Further, Linked Data guidelines
recommend use of RDFS [18, § 4.4.2] for defining terms and
interlinking vocabularies. As regards the broader OWL standard,
current guidelines explicitly mention use of owl:equivalentClass,
owl:equivalentProperty, owl:InverseFunctionalProperty and
owl:inverseOf [18, § 4.4.2]. However, other OWL features are not
mentioned. 

In terms of standards, RDFS and OWL 1 pre-date the Linked 
Data movement and are not directly tailored towards Linked Data 
requirements. Although the informative entailment rules for sup- 
porting RDFS inferences are relatively straightforward, things like 
the infinitely many entailed axiomatic triples reduce its practical- 
ity [28]. In OWL 1 the situation is more complex: OWL 1 Full 
further extends the RDFS semantics to the extent that reasoning 
becomes undecidable. In OWL 1 DL and OWL 1 Lite, where 
the semantics are based on Description Logics, typical reasoning 
tasks remain decidable, but are of exponential or harder worst-case 
complexity. OWL 2 addresses the complexity issue by defining 
profiles [6]: fragments for which at least some reasoning tasks are 
tractable. Reasoning with inconsistent data is, however, still prob- 
lematic in any OWL fragment. Further, each profile is a syntactic 
subset of OWL DL such that RDF data must adhere to certain non- 
trivial conditions which are commonly not followed in Web ontolo- 
gies [2, 38, 7]. However, OWL RL includes a ruleset called OWL 
RL/RDF, which is applicable over arbitrary RDF data. 

Although the OWL RL profile is implementable using straight- 
forward rule-based technologies, (as we will see) the profile still 
includes many features with sparse uptake in Linked Data vocab- 
ularies. Which features are prominently used is, however, unclear. 
In order to clarify this, we survey a broad spectrum of RDF Web 
data and measure uptake of individual RDFS and OWL features 
used therein. Since datatypes also play a role for OWL reason- 
ing, we additionally look at the use of datatypes in published data. 
We further analyse to what extent OWL features are supported by 
tools that provide the technical infrastructure for building complex 
Semantic Web applications. 

Our analysis suggests that a much simpler profile of OWL might 
be better targeted towards the current needs of the Linked Data 
community. We thus propose OWL LD (for Linked Data) as a sub- 
set of the OWL RL profile, using the insights of our survey to make 
an informed decision as to which features of the RDFS and OWL 
standards should be included in the profile. 

The remainder of the paper is structured as follows: In the next 
section, we introduce some preliminaries. In Section 3, we present 
our survey of the use of RDFS and OWL features on the Web, including
a survey of datatypes. In Section 4, we analyse the tool
support for RDFS and OWL. Drawing upon our observations, we 
propose and define the OWL LD profile in Section 5, and discuss 
formal aspects of reasoning over the profile in Section 6. Next, in 
Section 7 we give a synopsis of related work for empirical analyses 
of RDFS and OWL data on the Web. We conclude in Section 8. 

2. BACKGROUND 

Before analysing the use of OWL on the Web, we first recall some
relevant features of RDF, RDFS, and OWL semantics and give a 
summary of the existing OWL profiles. 

2.1 RDF Graphs and Their Semantics 

Given the set of URI references U, the set of blank nodes B, 
and the set of literals L, the set of RDF constants is denoted by 
$C := U \cup B \cup L$. We use CURIEs to denote URIs (e.g., owl:sameAs),
where the prefixes used in this paper can be looked up, e.g., at 
http://prefix.cc/. We often use Turtle syntax; e.g., we may use 
a as a shortcut for rdf:type. Finally, V denotes the set of RDF 
variables ranging over C and we prefix variables with '?'. 

An RDF triple $(s, p, o)$ is a triple from the set of all RDF triples
$\mathcal{G} := (U \cup B) \times U \times C$, where s is called subject, p predicate, and o
object. We call a finite set of triples $G \subset \mathcal{G}$ an RDF graph.
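
The definition above can be made concrete with a minimal Python sketch. The class names (URI, BNode, Literal) and the helper is_rdf_triple are illustrative assumptions and do not come from any particular RDF library; the sketch only checks that a triple lies in (U ∪ B) × U × C.

```python
from dataclasses import dataclass


# Hypothetical term classes standing in for the sets U, B and L.
@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BNode:
    label: str


@dataclass(frozen=True)
class Literal:
    lexical: str


def is_rdf_triple(s, p, o) -> bool:
    """Check membership in (U ∪ B) × U × C, where C = U ∪ B ∪ L."""
    return (
        isinstance(s, (URI, BNode))                # subject: URI or blank node
        and isinstance(p, URI)                     # predicate: URI only
        and isinstance(o, (URI, BNode, Literal))   # object: any RDF constant
    )


same_as = URI("http://www.w3.org/2002/07/owl#sameAs")
ok = is_rdf_triple(URI("http://example.org/a"), same_as, URI("http://example.org/b"))
bad = is_rdf_triple(Literal("a"), same_as, URI("http://example.org/b"))
```

Here ok is true, while bad is false because a literal may not appear in the subject position.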

Semantically, RDF graphs can be interpreted in a number of 
ways based on various W3C recommendations. The simple se- 
mantics [17] considers only the graph structure of RDF, whereas 
more elaborate semantics such as RDFS entailment [17] or the 
OWL 2 Direct and RDF-Based Semantics (see below) provide spe- 
cial meanings for certain terms. 

The common basis for all such semantics is that they are speci- 
fied in terms of model theory: one defines interpretations together 
with necessary and sufficient conditions that specify when an interpretation
satisfies a graph. When defining a semantics E (such
as RDF, RDFS, etc.) one often speaks of E-interpretations and
E-satisfaction. The E-interpretations that E-satisfy a graph
G are called the E-models of G. Semantic entailment follows from
this notion: a graph G E-entails a graph G', written $G \models_E G'$, if and
only if every E-model of G is also an E-model of G'.

2.2 OWL and its Fragments 

OWL 2 is an ontology language that provides advanced schema 
modelling capabilities that can be used together with RDF data. 
OWL 2 supersedes the earlier specification "OWL 1" by introduc- 
ing new modelling features, additional serialisations, updated con- 
formance conditions and various corrections. When omitting the 
version number, we thus mean the current standard OWL 2. 

Every RDF graph can be considered as an OWL ontology and the 
language of all RDF documents is called OWL Full to emphasise 
that all such graphs should be viewed as ontologies. In applications, 
however, OWL ontologies are usually viewed as being composed of 
axioms, that can be more complex than single triples. For example, 
the triple ex:a owl:sameAs ex:b . corresponds to the OWL axiom
SameIndividual(ex:a ex:b) whereas the axiom

ObjectPropertyRange(skos:member 
ObjectUnionOf(skos:Concept skos:Container)) (1) 

expands to the six RDF triples 

skos:member rdfs:range _:x .      _:x owl:unionOf _:x1 .
_:x1 rdf:first skos:Concept .     _:x1 rdf:rest _:x2 .       (2)
_:x2 rdf:first skos:Container .   _:x2 rdf:rest rdf:nil .
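
The expansion of axiom (1) into the triples in (2) can be sketched as follows. This is a minimal illustration, assuming CURIE strings for terms and locally generated blank-node labels; the function name range_union_to_triples is hypothetical, and the sketch mirrors only the six triples shown above, without any additional type declarations.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"


def range_union_to_triples(prop, classes):
    """Expand ObjectPropertyRange(prop ObjectUnionOf(classes...)) into triples.

    Blank-node labels (_:x, _:x1, _:x2, ...) are generated for illustration.
    """
    if len(classes) < 2:
        raise ValueError("a union needs at least two classes")
    union_node = "_:x"
    # One list cell per class, chained through rdf:rest.
    cells = [f"_:x{i}" for i in range(1, len(classes) + 1)]
    triples = [
        (prop, "rdfs:range", union_node),
        (union_node, "owl:unionOf", cells[0]),
    ]
    for i, cls in enumerate(classes):
        rest = cells[i + 1] if i + 1 < len(cells) else RDF_NIL
        triples.append((cells[i], RDF_FIRST, cls))
        triples.append((cells[i], RDF_REST, rest))
    return triples


triples = range_union_to_triples("skos:member", ["skos:Concept", "skos:Container"])
```

Each additional class in the union adds one list cell and therefore two more triples, which is why such expressions cannot be recognised by inspecting a single triple.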

Additional conditions need to be imposed on RDF graphs to ensure 
that they are in one-to-one correspondence to a collection of OWL
axioms. A syntactic subset of OWL Full for which this is possible is
OWL DL, which also imposes further restrictions that are useful for 
computing semantic conclusions from the ontology [27]. It can still 
be computationally expensive to draw conclusions from OWL DL 
ontologies. To address this, OWL further defines three syntacti- 
cally restricted sub-languages (profiles) of OWL DL called OWL 
EL, OWL RL and OWL QL [6] (see also Table 2 later for a brief 
feature comparison). OWL Full, OWL DL and the OWL profiles 
together constitute the five language fragments of OWL. The essen- 
tial features of RDF Schema (sub-classes and -properties, domain, 
range) are covered by all of these fragments, but only OWL Full 
supports arbitrary RDF documents. 

Various further sub-languages of OWL have been proposed out- 
side of the official standard. The current profiles themselves have 
been inspired by existing approaches: EL++ for OWL EL [24],
DL-Lite [5] for OWL QL, and Description Logic Programs (DLP) [13]
and pD* [35] for OWL RL. Generally, these approaches aimed to 
maximise the expressivity and thus approach the current standard 
quite closely (albeit, only for OWL 1 features). DLP is defined 
as a syntactic fragment of OWL. Other languages - including pD* 
- came about by extending RDFS with some additional features. 
Allemang and Hendler proposed RDFS-Plus based on an informal 
survey of practitioners and three criteria felt important for adop- 
tion: pedagogism (intuitive and easy to learn), practicality (real 
use-cases in modelling), and computational feasibility (not too hard 
to implement) [1]. This language was later extended to RDFS 3.0 
along similar principles [19]. Fisher et al. propose a similar profile 
to RDFS-Plus called L2, where the rationale for including or ex- 
cluding features is given on an ad-hoc basis [11]. A more detailed 
overview of the main features for these languages is also found in 
Table 2. 

2.3 OWL Semantics and Reasoning 

OWL ontologies can be interpreted under two different seman- 
tics that agree in important cases: the RDF-Based Semantics (RS) 
[17] and the Direct Semantics (DS) [26]. Like in RDF(S), the se- 
mantics are defined by specifying a model theory, i.e., by defining 
valid interpretations for ontologies based on semantic conditions. 
In RS, these models are based on the representation of OWL ax- 
ioms as RDF graphs and thus can be viewed as a refined form of 
RDF interpretation. In DS, models are directly defined based on 
the structure of OWL axioms in the conceptual framework of De- 
scription Logics (which in turn is based on first-order logic). Due to 
this, DS is only defined for ontologies that belong to the OWL DL 
language (or to any of its profiles) while RS can also be used on 
OWL Full. Besides this restriction, OWL language fragments are 
not tied to either semantics, leaving 9 valid combinations of syn- 
tactic fragment and formal semantics [34]. 

RS is arguably more robust since it is defined for any RDF graph 
while DS only works for ontologies in OWL DL. However, RS 
entailment (of derived facts) is undecidable so that concrete im- 
plementations can compute only a subset of the conclusions that 
the semantics specifies. In contrast, there are complete implemen- 
tations for computing entailments under DS, though with a high 
(super-exponential) worst-case complexity if all of OWL DL is to 
be covered. When further restricting to the OWL profiles, entail- 
ment checking under DS can even be done in polynomial time. For 
RS, it is not known in general if the entailment problem becomes 
simpler in these cases. It is known, however, that RS and DS yield 
the same entailments on OWL RL under certain additional condi- 
tions, leading to a partial tractability result for RS for this particular 
case [6]. Similar results could be obtained in other cases since DS 
reasoning algorithms can typically be modified to obtain correct
(though often incomplete) RS reasoners.

DS reasoning in all of the OWL profiles and significant parts of 
OWL DL can be implemented using rules in a forward-chaining 
manner. For OWL RL, an algorithm is suggested in the specifica- 
tion [6], while other works have covered OWL EL [24] and parts 
of OWL DL that also cover OWL QL [33]. For OWL QL, query 
rewriting is a more common reasoning technique [5, 31]. There 
are many different reasoning techniques for OWL DL under DS, 
though not all of them lead to polynomial algorithms when applied 
to the OWL profiles. Two (necessarily incomplete) reasoning meth- 
ods are known for RS: algorithms based on sets of derivation rules 
like the ones for OWL RL and an approach based on using first- 
order theorem provers [32]. 
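
Rule-based forward chaining of the kind mentioned above can be sketched in a few lines of Python. The two rules correspond to cax-sco and prp-dom from the OWL RL/RDF ruleset; the naive fixpoint loop and the use of CURIE strings for terms are simplifying assumptions, not a description of how any production reasoner is built.

```python
def forward_chain(triples):
    """Naive fixpoint over two rules in the style of OWL RL/RDF.

    cax-sco: (?c1 rdfs:subClassOf ?c2), (?x rdf:type ?c1) -> (?x rdf:type ?c2)
    prp-dom: (?p rdfs:domain ?c),       (?x ?p ?y)        -> (?x rdf:type ?c)
    """
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            if p == "rdfs:subClassOf":
                # Every instance of the sub-class s is an instance of o.
                new |= {(x, "rdf:type", o)
                        for (x, q, c) in closure
                        if q == "rdf:type" and c == s}
            elif p == "rdfs:domain":
                # Every subject of property s is an instance of o.
                new |= {(x, "rdf:type", o)
                        for (x, q, _) in closure if q == s}
        if new <= closure:  # nothing new was derived: fixpoint reached
            return closure
        closure |= new


graph = {
    ("ex:hasPet", "rdfs:domain", "ex:PetOwner"),
    ("ex:PetOwner", "rdfs:subClassOf", "ex:Person"),
    ("ex:alice", "ex:hasPet", "ex:rex"),
}
inferred = forward_chain(graph)
```

In this example the domain rule first types ex:alice as ex:PetOwner, after which the sub-class rule adds the type ex:Person in a second round.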

3. SURVEY OF RDFS & OWL ADOPTION 
ON THE WEB OF DATA 

We now present an empirical survey of RDFS & OWL adoption 
on the Web of Data. Our survey is conducted over the Billion Triple 
Challenge 2011 corpus, which consists of 2.145 billion quadru- 
ples crawled from 7.411 million RDF/XML documents through 
an open crawl run in May/June 2011 spanning 791 pay-level domains.
(A pay-level domain is a direct sub-domain of a top-level
domain (TLD) or a second-level country domain (ccSLD), e.g., 
dbpedia.org, bbc.co.uk. This gives us our notion of "domain"). 
This corpus represents a broad sample of the Web of Data. 

3.1 Measures Used 

In order to adequately characterise the uptake of various RDF(S) 
and OWL features used in this corpus, we present different mea- 
sures to quantify their prevalence and prominence. 

First, we look at the prevalence of use of different features, i.e., 
how often they are used. Here, we must take into account the di- 
versity of the data under analysis, where few domains account for a 
great many triples and many domains account for few triples, where 
certain domains tend to publish many small documents and others 
publish few large documents, and so forth [20]. We thus present 
three statistics: (1) number of axioms using the feature [Ax], (2) 
number of documents [Doc] and (3) number of domains [Dom]. 
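
The three prevalence statistics can be illustrated with a small sketch over quadruples of the form (subject, predicate, object, document URL). Several simplifications are assumed here: a feature counts as used when it appears as a predicate or as the object of rdf:type, each such triple is counted as one axiom, and pay_level_domain relies on a tiny hard-coded list of country-code second-level domains rather than a complete suffix list.

```python
from collections import defaultdict
from urllib.parse import urlparse

CC_SLDS = {"co.uk", "gov.uk", "ac.uk"}  # illustrative only, far from complete


def pay_level_domain(url):
    """Approximate the pay-level domain of a document URL."""
    host = urlparse(url).hostname or ""
    labels = host.split(".")
    take = 3 if ".".join(labels[-2:]) in CC_SLDS else 2
    return ".".join(labels[-take:])


def prevalence(quads, features):
    """Return {feature: (Ax, Doc, Dom)} for (s, p, o, document_url) quads."""
    ax = defaultdict(int)
    docs = defaultdict(set)
    doms = defaultdict(set)
    for s, p, o, doc in quads:
        if p in features:
            used = p
        elif p == "rdf:type" and o in features:
            used = o
        else:
            continue
        ax[used] += 1
        docs[used].add(doc)
        doms[used].add(pay_level_domain(doc))
    return {f: (ax[f], len(docs[f]), len(doms[f])) for f in features}


quads = [
    ("ex:a", "owl:sameAs", "ex:b", "http://dbpedia.org/data/a.rdf"),
    ("ex:a", "owl:sameAs", "ex:c", "http://dbpedia.org/data/a.rdf"),
    ("ex:d", "owl:sameAs", "ex:e", "http://www.bbc.co.uk/programmes/x.rdf"),
    ("ex:P", "rdf:type", "owl:FunctionalProperty", "http://dbpedia.org/ontology/"),
]
stats = prevalence(quads, {"owl:sameAs", "owl:FunctionalProperty"})
```

Reporting all three numbers side by side guards against a single large exporter dominating the picture through sheer triple count.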

Second, we look at the prominence of use of different features. 
We use PageRank to quantify our notion of prominence: PageRank 
calculates a variant of the Eigenvector centrality of nodes (e.g., doc- 
uments) in a graph, where taking the intuition of directed links as 
"positive votes", the resulting scores help characterise the relative 
prominence of particular nodes on the Web [29, 15]. 

In particular, we first rank documents in the corpus. To construct 
the graph, we consider RDF documents as nodes, where a directed 
edge $(d_1, d_2)$ is extended from document $d_1$ to $d_2$ iff $d_1$ hosts RDF
data that contains (in any triple position) a URI that dereferences to
document $d_2$. This notion of dereferenceable links is core to Linked
Data principles [3]. Note also that we follow redirects when check- 
ing dereferenceability. We then apply a standard PageRank analysis 
over the resulting directed graph, using the power iteration method 
with ten iterations. For reasons of space, we refer the interested 
reader to [29] for more detail on PageRank, and [20] for more de- 
tail on the particular algorithms used in this paper. 
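
As a rough illustration of the ranking step, the following sketch runs PageRank by power iteration over a document graph. The damping factor of 0.85, the uniform redistribution of rank from documents without outgoing links, and the exclusion of self-links are assumptions made for this sketch; the text above fixes only the number of iterations (ten).

```python
def pagerank(edges, nodes, damping=0.85, iterations=10):
    """Power-iteration PageRank over a directed document graph.

    Rank held by dangling nodes (no out-links) is spread uniformly.
    """
    nodes = list(nodes)
    n = len(nodes)
    out = {v: set() for v in nodes}
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out[src].add(dst)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not out[v])
        base = (1.0 - damping) / n + damping * dangling / n
        nxt = {v: base for v in nodes}
        for v in nodes:
            for w in out[v]:
                nxt[w] += damping * rank[v] / len(out[v])
        rank = nxt
    return rank


# Three documents that all link to one vocabulary document.
docs = ["d1", "d2", "d3", "vocab"]
links = [("d1", "vocab"), ("d2", "vocab"), ("d3", "vocab")]
rank = pagerank(links, docs)
```

The scores always sum to one, so they can be read as a probability distribution over documents; in this toy graph the vocabulary document receives the highest score.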

Given these rank scores, for the different RDF(S) and OWL features
we then present (1) the sum of PageRank scores for the documents
in which they are used [Σ Rank]; (2) the PageRank score of
the highest-ranked document in which they appear [max Rank]; (3)
the position of that document in the ordering of the
7.411 million documents by PageRank [max Pos].
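
Given per-document scores, the three prominence measures reduce to a simple aggregation. The sketch below assumes two hypothetical inputs: a dictionary rank from documents to PageRank scores and a dictionary docs_using from features to the set of documents in which they appear; positions are 1-based in descending order of score.

```python
def prominence(rank, docs_using):
    """Return {feature: (sum_rank, max_rank, max_pos)}."""
    ordering = sorted(rank, key=rank.get, reverse=True)
    position = {doc: i + 1 for i, doc in enumerate(ordering)}
    result = {}
    for feature, docs in docs_using.items():
        if not docs:
            continue  # feature not used anywhere in the corpus
        best = max(docs, key=rank.get)  # highest-ranked document using it
        result[feature] = (
            sum(rank[d] for d in docs),
            rank[best],
            position[best],
        )
    return result


rank_scores = {"rdfs": 0.5, "skos": 0.3, "foaf": 0.2}
usage = {
    "rdfs:subClassOf": {"rdfs", "skos"},
    "owl:unionOf": {"skos"},
}
measures = prominence(rank_scores, usage)
```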

In terms of intuition under the random surfer model of PageRank
[29], given an agent starting from a random location and
traversing documents on (our sample of) the Web of Data through
randomly selected dereferenceable URIs, the Σ Rank value for
a feature approximately indicates the probability with which that
agent will be at a document using that feature after traversing ten
links. In other words, the score indicates the likelihood of an agent,
operating over the Web of Data based on dereferenceable principles,
to encounter a given feature.

The graph extracted from the corpus consists of 7.411 million
nodes and 198.6 million edges. Table 1 presents the top ten ranked
documents in our corpus, which are dominated by core meta-vocabularies,
documents linked therefrom, and other popular vocabularies;
we also present the ranks of other notable documents mentioned
in the following section.²

Table 1: Top ten ranked documents and notable ranks (position
< 1,000) mentioned later in Table 2

3.2 Survey of RDF(S)/OWL Features 

Table 2 presents the results of the survey of RDF(S) and OWL 
usage in our corpus, where for features with non-trivial semantics, 
we present the measures mentioned in the previous section, as well 
as support for the features in the different reasoning profiles dis- 
cussed in Section 2.2. We exclude rdf:type, which appeared in 
90.3% of documents. We present the table ordered by the sum of 
PageRank measure [Σ Rank]; recall that Table 1 provides a legend
for notable documents (Pos < 1,000).

In column BF, we indicate which features have expressions that 
can be represented as a single RDF triple, i.e., which features do 
not require auxiliary blank nodes of the form _:x or the SEQ pro- 
duction in Table 1 of the OWL 2 Mapping to RDF document [30]. 
This distinction is motivated by our initial observations that such 
features are typically the most widely used in Web data. 

Figure 1 gives a visual impression of the sum of PageRank mea- 
sure for the listed features (log scale), where different shades of 
grey are used to indicate to which vocabulary a term belongs (e.g., 
distinguishing the terms new in OWL 2 from the ones already in 
OWL 1). 

²We ran another similar analysis with links to and from core
RDF(S) and OWL vocabularies disabled. The results for the feature
analysis remained similar. Mainly owl:sameAs dropped several
positions in terms of the sum of PageRank.

Table 2: Survey of RDFS/OWL primitives used on the Web of Data and support in different tractable profiles where * denotes that
the semantics is not fully axiomatised by the OWL RL/RDF rules or that usage of the term is restricted under OWL Direct Semantics

Regarding prevalence, we see from Table 2 that owl:sameAs
is the most widely used axiom in terms of documents (1.778 million;
24%) and domains (117; 14.8%). Surprisingly (to us), RDF
container membership properties (rdf:_*) are also heavily used.
Regarding prominence, we make the following observations:

(1) The top six features are those that form the core of RDFS [28].

(2) The RDF(S) declaration classes rdfs:Class, rdf:Property
are used in fewer, but more prominent documents than OWL's versions
owl:Class, owl:DatatypeProperty, owl:ObjectProperty.

(3) The top eighteen features are expressible with a single RDF
triple. The highest ranked primitive for which this is not the case
is owl:unionOf in nineteenth position, which requires use of RDF
collections (i.e., lists). Union classes are often specified as the domain
or range of a given property: the most prominent such example
is the SKOS vocabulary (the seventh highest ranked document)
which specifies the range of the skos:member property as
the union of skos:Concept and skos:Container as in (1) above.

(4) Of the features new to OWL 2, the most prominently used is
owl:NamedIndividual in thirty-first position. Our crawl was conducted
nineteen months after OWL 2 became a W3C Recommendation
(Oct. 2009); by means of a quick scan of the max Pos column
of Table 2, we note that new OWL 2 features have had little
penetration in prominent Web vocabularies during that interim.
Further, several OWL 2 features were not used at all in our corpus.

(5) owl:complementOf and owl:differentFrom are the least
prominently used original OWL features.

In terms of profile support, we observe that RDFS has good
catchment for a few of the most prominent features, but otherwise
has poor coverage. Aside from syntactic/declaration features, from
the top-20 features, L2 misses functional properties (pos=12), disjoint
classes (15), inverse-functional properties (18) and union classes (19).
RDFS-Plus omits support for disjoint (15) and union classes (19). DLP
- as defined by Volz [37, §A] - has coverage of all such features,
but does not support inverse-functional (18) datatype properties. pD*
does not support disjoint (15) or union classes (19).

Regarding the OWL profiles, OWL EL and OWL QL both omit
support for important top-20 features. Neither includes functional (12)
or inverse-functional properties (18), or union classes (19). OWL EL
further omits support for inverse (14) and symmetric properties (20).
OWL QL does not support the prevalent same-as (16) feature. Conversely,
OWL RL has much better coverage, albeit having only partial
support for union classes (19).

Figure 1: The sum of PageRank for each of the listed features from Table 2 shown in logarithmic scale on the vertical axis

Summing up, we acknowledge that such a survey of RDFS and 
OWL cannot give a universal or definitive indication of the most 
important modelling features for Linked Data. Also, OWL 2 terms 
might need some more time for adoption still. However, the re- 
sults offer useful insights into trends of adoption, which inform the 
design of a novel OWL profile tailored for the Web of Data. 

3.3 Survey of Datatype Use 

We now look at the use of datatypes on the Web of Data. 

Aside from plain literals, the RDF semantics defines a single 
datatype supported under RDF-entailment: rdf:XMLLiteral [17].
However, the RDF semantics also defines D-entailment, which pro- 
vides interpretations over a datatype map that gives a mapping from 
lexical datatype strings into a value space. The datatype map may 
also impose disjointness constraints within its value space. These 
interpretations allow for determining which lexical strings are valid 
for a datatype, which different lexical strings refer to the same value 
and which to different values, and which sets of datatype values are 
disjoint from each other. An XSD-datatype map is then defined 
which extends the set of supported datatypes into those defined for 
XML Schema (1.0), including types for boolean, numeric, tempo- 
ral, string and other forms of literals. Datatypes which are deemed 
to be ambiguously defined (viz. xsd:duration) or specific to XML 
(e.g., xsd:QName), etc. are omitted. 
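
The role of a datatype map can be illustrated with a small sketch covering three XSD datatypes. The names DATATYPE_MAP and value_of are hypothetical, and the lexical checks are deliberately simplified (for instance, no whitespace normalisation is applied). Each entry pairs a value-space label with a lexical-to-value function, so that ill-typed literals are rejected, different lexical forms of the same value compare equal, and values from disjoint value spaces never do.

```python
import re
from decimal import Decimal


def _boolean(lexical):
    if lexical in ("true", "1"):
        return True
    if lexical in ("false", "0"):
        return False
    raise ValueError(f"ill-typed xsd:boolean: {lexical!r}")


def _integer(lexical):
    if not re.fullmatch(r"[+-]?[0-9]+", lexical):
        raise ValueError(f"ill-typed xsd:integer: {lexical!r}")
    return Decimal(lexical)


def _decimal(lexical):
    if not re.fullmatch(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)", lexical):
        raise ValueError(f"ill-typed xsd:decimal: {lexical!r}")
    return Decimal(lexical)


# datatype -> (value-space label, lexical-to-value mapping);
# xsd:integer shares the value space of xsd:decimal, from which it is derived.
DATATYPE_MAP = {
    "xsd:boolean": ("boolean", _boolean),
    "xsd:integer": ("decimal", _integer),
    "xsd:decimal": ("decimal", _decimal),
}


def value_of(lexical, datatype):
    """Map a typed literal to (value space, value); raise if ill-typed."""
    space, parse = DATATYPE_MAP[datatype]
    return (space, parse(lexical))


same = value_of("01", "xsd:integer") == value_of("1.0", "xsd:decimal")
disjoint = value_of("1", "xsd:boolean") != value_of("1", "xsd:integer")
```

Both same and disjoint evaluate to true here: "01" and "1.0" denote the same number, whereas the boolean "1" and the integer "1" live in disjoint value spaces.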

The original OWL specification recommends use of a similar set 
of datatypes to that for D-entailment, where compliant reasoners 
are required to support xsd:string and xsd:integer. Further, OWL
allows for defining enumerated datatypes. 

With the standardisation of OWL 2 came two new datatypes, 
namely owl:real and owl:rational. Also, OWL 2 added support
for xsd:dateTimeStamp. However, XSD datatypes relating to 
date, time and Gregorian calendar values are not supported. Fur- 
ther, OWL 2 introduced mechanisms for defining new datatypes 
by restricting facets of legacy defined datatypes; however, we note 
from owl:onDatatype in Table 2 that facet restrictions have only
one or two uses in our corpus. 

Implementing the entire range of RDF, XSD and OWL datatypes
can be costly [10], with custom code (or an external library) required
to support each one. Thus, it is interesting to see which
datatypes are most commonly used on the Web of Data.

In our corpus, we found 278 different datatype URIs assigned
to literals. Of these, 158 came from the DBpedia exporter which
models SI units, currencies, etc., as datatypes. Using analogous
measures as before, Table 3 lists the top standard RDF(S), OWL
and XSD datatypes as used to type literals in our corpus. We omit
max-rank statistics for brevity, and omit plain literals which were
used in 6.609 million documents (89%). D indicates the datatypes
supported by D-entailment with the recommended XSD datatype
map. O2 indicates the datatypes supported by OWL 2.

Table 3: Survey of (std.) datatypes used on the Web of Data

We observe from the table that the top four standard datatypes
are supported by both the traditional XSD datatype map and in
OWL 2. However, OWL 2 does not support xsd:date (5), which is
prominently featured in our corpus, and does not support Gregorian
datatypes (10, 15, 18, 20, 22) nor xsd:time (26). Despite not being supported
by any standard entailment regime, xsd:duration (14) was used in
28 thousand documents across four different domains.

Conversely, various standard datatypes are not used at all in the 
data; e.g., xsd:dateTimeStamp, the "new" OWL datatypes, bi- 
nary datatypes and various normalised-string/token datatypes. 

4. AVAILABLE TOOL SUPPORT 

When assessing the practical utility of certain OWL constructs,
it is crucial to consider the available tool support. In this section,
we survey the availability of software that provides the necessary
technical infrastructure for building complex applications, i.e.,
databases, reasoners and libraries.

Even if no logical inferencing is required, tools that want to sup- 
port a certain OWL feature usually need to be able to read OWL 
documents that contain this feature or use a library for this task.
Conformance with the OWL standard even requires support for the 
RDF/XML serialisation as an input format [34]. Parsing triples, 
e.g., in RDF/XML or Turtle format, into OWL axioms is not an 
easy task, since axioms can be composed of several RDF triples, 
which might be distributed all over the document [30]. In addi- 
tion, OWL axioms may require use of arbitrary-length RDF lists 
which require particular attention to parse. Moreover, many RDF 
triples are ambiguous and type declaration axioms are necessary 
to resolve this. Further OWL-specific mechanisms such as imports 
add to the difficulty of writing an OWL parser based on one for 
RDF/XML or Turtle. Consequently, there are hardly any stand- 
alone libraries for parsing OWL (as opposed to RDF): we are only 
aware of the Java-based OWL API [21]. 

For tools that cannot use the OWL API due to technical or legal 
constraints, this puts up a major barrier for using OWL. Luckily, 
OWL axioms that are represented in a single RDF triple do not 
require the detection of complex triple patterns and can easily be 
processed with the RDF libraries and parsers that are available for 
many programming languages. The question of whether a feature 
can be expressed in a single triple or not may thus already have 
significant consequences for the practical cost of supporting it. 
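
To illustrate why single-triple axioms are cheap to support, the sketch below separates them from triples that are merely one part of a larger blank-node structure. The predicate list is an illustrative subset chosen for this example, and the blank-node test assumes terms are written as CURIE strings with the _: prefix.

```python
# Illustrative subset of predicates whose axioms fit into one triple.
SINGLE_TRIPLE_PREDICATES = {
    "rdfs:subClassOf", "rdfs:subPropertyOf", "rdfs:domain", "rdfs:range",
    "owl:sameAs", "owl:equivalentClass", "owl:equivalentProperty",
    "owl:inverseOf",
}


def is_blank(term):
    return term.startswith("_:")


def split_axioms(triples):
    """Split schema triples into (self-contained, needs-pattern-matching)."""
    simple, structured = [], []
    for triple in triples:
        s, p, o = triple
        if p not in SINGLE_TRIPLE_PREDICATES:
            continue
        # A blank node signals an anonymous class expression whose
        # definition is spread over further triples.
        if is_blank(s) or is_blank(o):
            structured.append(triple)
        else:
            simple.append(triple)
    return simple, structured


data = [
    ("foaf:knows", "rdfs:domain", "foaf:Person"),
    ("skos:member", "rdfs:range", "_:x"),
    ("_:x", "owl:unionOf", "_:x1"),
    ("ex:a", "owl:sameAs", "ex:b"),
]
simple, structured = split_axioms(data)
```

The triples in simple can be handed to a rule engine as they are, whereas each entry in structured requires collecting and validating the remaining triples of the expression first.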

Databases are another important class of tools for building RDF
applications and a sizeable number of commercial and non-commercial
systems are available today. Many of these systems evaluate
OWL features to provide improved query answering services. Table 4
provides an overview of tools in that area. We restrict to
tools that have native support at least for rdfs:subClassOf and
rdfs:subPropertyOf reasoning (excluding, e.g., 4store), are developed
for production use (excluding prototypes such as YARS2
[16] and QueryPie [36]) and that are meant to be used with large
amounts of instance data (excluding OWL EL tools such as ELK
[22]). The table lists the most frequently implemented features explicitly
and describes profile support in a separate column. We additionally
mention the main inference strategy and the source of our
information.³

A number of tools support the (near-)complete OWL RL profile.
Jena with the "OWL mini" ruleset has an incomplete implementation
of OWL (1) DL features that can be viewed as an approximation
of OWL RL. PelletDb and QuOnto are reasoning layers on top
of a database with support for OWL DL and OWL QL, respectively.
DLEJena uses Pellet to perform TBox (schema) reasoning, where
the resulting entailments and the OWL RL/RDF rules are used to
generate a set of ABox (instance) rules, which are then executed
using Jena's RETE engine.

³We note that it is difficult to verify whether the tools indeed deliver
what they claim, e.g., in practice one might find that the support is
not as complete as advertised. Nevertheless, we take each system's
description as an indication of available support.

Contrasting with these fairly powerful implementations, we find 
a number of tools that support only a few selected semantic fea- 
tures, including some that only support a fragment of RDFS. 

The reasoning algorithms that have been used are also important 
in practice. Forward chaining (materialisation) often incurs sig- 
nificant penalties for data updates, although there are approaches 
to alleviate this, e.g., AllegroGraph advertises "dynamic materi- 
alisation" as a compromise. Backward chaining, in contrast, af- 
fects query answering performance but allows for easier updates. 
In the case of OWL QL (and RDFS), backward chaining can be 
performed in a particularly effective kind of query rewriting that 
depends on the schema information only and is thus likely to scale 
to bigger data volumes. The tableau approach of PelletDb, on the 
other hand, is more demanding when used at query time but can 
support all features of OWL DL. 
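
The schema-only query rewriting mentioned above can be sketched for the simplest case, rdfs:subClassOf. The function names are hypothetical and only class-membership queries are handled; a fuller rewriter would also take rdfs:subPropertyOf, rdfs:domain and rdfs:range into account. The point is that the rewriting consults the schema alone and never touches instance data.

```python
def subclasses_of(cls, schema):
    """All classes c with c rdfs:subClassOf* cls (reflexive, transitive)."""
    found, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for (sub, p, sup) in schema:
            if p == "rdfs:subClassOf" and sup == current and sub not in found:
                found.add(sub)
                todo.append(sub)
    return found


def rewrite_type_query(cls, schema):
    """Rewrite 'SELECT ?x WHERE { ?x a cls }' into a UNION over sub-classes."""
    branches = [f"{{ ?x a {c} }}" for c in sorted(subclasses_of(cls, schema))]
    return "SELECT DISTINCT ?x WHERE { " + " UNION ".join(branches) + " }"


schema = {
    ("ex:Dog", "rdfs:subClassOf", "ex:Pet"),
    ("ex:Puppy", "rdfs:subClassOf", "ex:Dog"),
}
query = rewrite_type_query("ex:Pet", schema)
```

The rewritten query can then be evaluated by a plain SPARQL engine over the unmaterialised data, so updates to instance data require no re-computation of inferences.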

Summarising, among the listed systems, three systems work with
the Direct Semantics of OWL (PelletDb, DLEJena and QuOnto),
whereas the other systems are rule-based and work directly with
RDF triples, usually via forward chaining. Thus, we conclude
that an implementation via rules and compatibility also with the
RDF-Based Semantics is an important criterion for comprehensive
tool support. Surprisingly, only two thirds of the tools support
owl:sameAs, which is one of the most popular features according
to our survey. A possible explanation is that owl:sameAs blows up
the size of the materialisation when using forward-chaining, so
efficient support requires special optimisations, as, e.g., implemented
in OWLIM or Oracle 11g [23]. Although four systems
(nearly) support OWL RL, the complexity of a fully compliant and
efficient implementation is still considered high [23].
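
One well-known way to avoid the blow-up caused by owl:sameAs is to keep a single canonical identifier per equivalence class instead of copying every statement to every equivalent identifier. The sketch below uses a union-find structure for this purpose; it is a generic illustration under our own assumptions and is not claimed to be the optimisation used in OWLIM or Oracle. Predicates are left untouched for simplicity.

```python
def canonicalise(triples):
    """Rewrite subjects and objects to one representative per sameAs class."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # First pass: merge the classes of all terms linked by owl:sameAs.
    for s, p, o in triples:
        if p == "owl:sameAs":
            a, b = find(s), find(o)
            if a != b:
                # The lexicographically smallest term becomes canonical.
                parent[max(a, b)] = min(a, b)

    # Second pass: emit the remaining triples over canonical terms only.
    return {(find(s), p, find(o)) for s, p, o in triples if p != "owl:sameAs"}


data = {
    ("ex:a", "owl:sameAs", "ex:b"),
    ("ex:b", "owl:sameAs", "ex:c"),
    ("ex:c", "ex:name", '"C"'),
    ("ex:a", "ex:age", '"7"'),
}
compact = canonicalise(data)
```

With full materialisation, every statement about one of n equivalent identifiers is repeated for all n of them; the canonical form stores it once, and the original identifiers can be recovered from the equivalence classes at query time.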

Regarding datatypes, many triple stores use internal canonicalisation
of typed literals, but full datatype reasoning is only sparsely
supported or documented; some tools such as OWLIM explicitly do 
not support datatype rules of OWL RL. Datatype support in several 
tools is, for example, surveyed by Emmons et al. [10]. 

5. DEFINING THE OWL LD PROFILE 

In this section, we build upon our previous observations to suggest
a simple OWL profile that is adequate for the current needs of
the Web. In the previous sections, we have identified a number of
key issues for OWL adoption on the Web:

1. Adequacy: features that are widely used on the Web should 
be included. 

2. Implementability: features that are more challenging to process
and reason with should be avoided.

3. Robustness: noisy and unreliable data should not prevent the
use of ontological data in reasoning.

Comparing this to the design guidelines of RDFS-Plus [1], we
can see that adequacy relates to "practicality" while implementability
subsumes "computational feasibility". We do not consider
"pedagogism" as a design goal since we did not assess how intuitive
features are. In contrast, the work presented in Sections 3 and
4 provides us with a much better understanding for assessing
implementability and adequacy. Robustness was not considered
as a design goal for RDFS-Plus, whereas we find it to be of great
importance for making sense of Web data.

Each of the above requirements leads to a number of concrete
aspects. Adequacy has been discussed in Section 3 based on a sample
of published ontologies. Looking at Table 2, we can see that
many of the most frequently used features are of a simple structure.
In fact, owl:unionOf is the highest ranked feature that is not
expressed by a single triple in RDF serialisations of OWL.

Implementability was discussed in Section 4. We observed that 
parsing OWL documents in RDF-based syntaxes (RDF/XML or 
Turtle) is easier when restricted to features that can be expressed 
by single triples, and which are thus directly represented in the 
RDF data model of available tools. Moreover, inferencing is more 
difficult for some features than for others, even in rule-based ap- 
proaches used commonly for OWL RL, e.g., support for list-based 
(multi-triple) expressions that can be of arbitrary length [4]. 
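
The difference between single-triple and list-based features can be made concrete with a small sketch. It assumes a simplified encoding in which triples are plain Python tuples and blank nodes are generated labels; the helper rdf_list and the ex: class names are hypothetical. A subclass axiom occupies one triple, whereas a union over n classes needs one triple for owl:unionOf plus 2n triples for the RDF collection, so its size grows with the number of members.

```python
from itertools import count

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
_fresh = count(1)  # source of blank-node labels

def rdf_list(members):
    """Encode members as an RDF collection; return (head node, triples)."""
    if not members:
        return RDF_NIL, []
    nodes = [f"_:b{next(_fresh)}" for _ in members]
    triples = []
    for i, (node, member) in enumerate(zip(nodes, members)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.append((node, RDF_FIRST, member))
        triples.append((node, RDF_REST, rest))
    return nodes[0], triples


# A single-triple axiom.
single = [("ex:Student", "rdfs:subClassOf", "ex:Person")]

# A list-based (multi-triple) expression over three classes.
head, list_triples = rdf_list(["ex:Student", "ex:Teacher", "ex:Staff"])
multi = [("ex:Member", "owl:unionOf", head)] + list_triples

print(len(single), len(multi))  # 1 7
```

A rule that consumes such an expression has to traverse the rdf:rest links, which is why its body cannot be written with a fixed number of atoms.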

Robustness requires a high tolerance against syntactic errors. 
The RDF-Based Semantics has this feature and can always be ap- 
plied, hence no special language design is needed. However, it is 
also desirable to be able to apply the Direct Semantics to a fragment 
as it yields stronger completeness guarantees for reasoning. Even 
if RDF-Based entailments are desired, the completeness of DS
reasoning methods can be used to obtain similar guarantees for RS [6,
Theorem PR1]. This kind of robustness can be accomplished by
reducing the use of features for which OWL DL imposes additional
requirements, in particular cardinalities and property chains.

Another aspect of robustness is tolerance to inconsistencies. This 
feature is generally available in OWL profiles that are not able to 
express truly disjunctive information. Due to this, all inconsis- 
tencies are directly related to an individual or literal upon which 
conflicting requirements are imposed (including the special case 
of ill-typed literal values). Hence, it is easy to ignore (all ele- 
ments involved in) inconsistencies and to continue reasoning on the 
remaining consistent ontology to derive meaningful conclusions. 
Any OWL profile (or subset thereof) has this feature. 
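
As a minimal sketch of this idea, the following code detects one kind of inconsistency (an individual typed with two classes declared owl:disjointWith, as in rule cax-dw of Table 5) and sets aside every triple that mentions a clashing individual. The function name quarantine_clashes and the ex: data are hypothetical, only this one clash type is handled, and the input is assumed to be already saturated under the inference rules.

```python
RDF_TYPE, DISJOINT = "rdf:type", "owl:disjointWith"

def quarantine_clashes(triples):
    """Return (kept triples, clashing individuals) for cax-dw violations."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    disjoint = [(s, o) for s, p, o in triples if p == DISJOINT]
    # An individual clashes if it belongs to both classes of a disjoint pair.
    clashing = {x for x, classes in types.items()
                for c1, c2 in disjoint if c1 in classes and c2 in classes}
    # Keep only triples that do not mention a clashing individual.
    kept = {t for t in triples if not clashing.intersection(t)}
    return kept, clashing


data = {
    ("ex:Cat", DISJOINT, "ex:Dog"),
    ("ex:alice", RDF_TYPE, "ex:Cat"),
    ("ex:alice", RDF_TYPE, "ex:Dog"),
    ("ex:bob", RDF_TYPE, "ex:Cat"),
    ("ex:alice", "ex:knows", "ex:bob"),
}
kept, clashing = quarantine_clashes(data)
print(sorted(clashing), len(kept))
```

Reasoning can then continue over the kept triples, while the quarantined individuals can be reported for repair.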

From these observations, we derive that it is a reasonable design 
guideline for an OWL profile to restrict to OWL axioms that are in 
OWL RL and at the same time are expressed as single RDF triples. 
This directly addresses implementability based on the above obser- 
vations together with the fact that OWL RL is now widely imple- 
mented. Adequacy is addressed since the most important features 
identified above are both in RL and expressed in single triples. Note 
that the coverage of additional, rarely used features like reflexive
properties is not a concern from the viewpoint of adequacy (which
asks for coverage, not for exclusivity) and is not difficult to
implement in the restricted fragment either.

Robustness for interpretation in DS (i.e., as a subset of OWL DL) 
is eased by the omission of property chains and (most) cardinali- 
ties (note that functionality remains). However, other restrictions 
of OWL DL regarding the need for declarations, the non-existence 
of inverse functional data properties, and the restrictions on blank 
nodes are still relevant. We suggest developing canonical (and thus
predictable) repair strategies for addressing these issues;
specifying this is left to future work. Moreover, robustness suggests that,
similarly to OWL RL, arbitrary RDF graphs should be allowed 
when using RS for reasoning. To reconcile these issues, we first 
define a syntactic OWL LD profile as a subset of OWL RL (which 
in turn imposes the syntactic restrictions of OWL DL) and we then 
suggest an RS based extension of this profile for reasoning with 
arbitrary OWL Full ontologies. 

Formally, we define OWL LD by restricting the OWL RL gram- 
mar [6]. Roughly speaking, we remove all definitions and mentions 
of productions listed as follows: 

Datatype entailment:
    DataRange, DataIntersectionOf, DatatypeDefinition

Boolean connectives & enums.:
    *OneOf, *IntersectionOf, *UnionOf, *ComplementOf

Restriction classes:
    *ValuesFrom, *HasValue, zeroOrOne, *Cardinality

Chains & keys:
    propertyExpressionChain, HasKey

Negative property assertions:
    sourceIndividual, target*, Negative*PropertyAssertion

We further restrict the productions for DifferentIndividuals and
Disjoint* to not use the list-based syntaxes. The full grammar
can be found online [12]. All additional structural restrictions of 
OWL DL are inherited from OWL RL. Note that all RL datatypes 
are supported as well, though implementers may use our study in 
Section 3 to select most relevant datatypes to support (the OWL 
specification generally allows conforming tools to answer entail- 
ment questions with Unknown if a used feature is not supported). 

Comparing OWL LD with earlier approaches, it is interesting
to note that it can be viewed as a natural extension of languages
like L2, RDFS-Plus and RDFS 3.0, as discussed in Sections 2 and 3. In
particular, RDFS 3.0 is already close to OWL LD, which mainly
adds further OWL 2 constructs from OWL RL while only omitting
owl:AllDifferent as the list-based variant of owl:differentFrom.
This adds to our confidence that OWL LD is a natural OWL profile
that can be motivated from a number of perspectives.

6. REASONING IN OWL LD 

OWL LD falls into a syntactic subset of OWL DL and can be
processed by tools that implement DS entailment checking. On the
other hand, we can also restrict the OWL RL/RDF rules to obtain
a terse set of inference rules that yields sound but possibly
incomplete entailment under RS; the full set is found in Table 5 at the end
of the paper. These rules are applicable to any RDF graph, allowing
us to robustly draw sound conclusions from Web data.



The OWL LD ruleset comprises rules of the form:

$B_1 \wedge \ldots \wedge B_n \rightarrow H \quad (0 \le n \le 3)$

where $H$ is called the head and $B_1 \wedge \ldots \wedge B_n$ is the body. A rule with
an empty body (e.g., the rule cls-thing) is simply a fact. Multiple
atoms in rule heads (e.g., eq-ref) denote conjunctions that could also
be expressed using multiple rules with the same body. The datatype
rules are somewhat exceptional, however, and require custom logic
outside of a standard rule engine. Moreover, some rules use false
in the head to express that an inconsistency is to be derived. An
inconsistency-tolerant system could already be realised by simply
not taking these conclusions into account for query answering.
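
Rules of this shape can be evaluated by a very small generic engine. The sketch below is one naive way to do so and is not the implementation of any particular tool: triples are Python tuples, variables are strings starting with "?", a rule is a (body, head) pair, and the names match, apply_rule and saturate as well as the ex: data are hypothetical. It encodes prp-dom, cax-sco and the fact cls-thing from Table 5; datatype rules and rules with false in the head are left out.

```python
def is_var(term):
    return term.startswith("?")

def match(atom, triple, binding):
    """Extend binding so that atom matches triple, or return None."""
    binding = dict(binding)
    for a, t in zip(atom, triple):
        if is_var(a):
            if binding.setdefault(a, t) != t:
                return None
        elif a != t:
            return None
    return binding

def apply_rule(body, head, triples):
    """Return all head instances derivable from triples in one step."""
    bindings = [{}]
    for atom in body:
        extended = []
        for b in bindings:
            for t in triples:
                b2 = match(atom, t, b)
                if b2 is not None:
                    extended.append(b2)
        bindings = extended
    # An empty body leaves the single empty binding, so the head is a fact.
    return {tuple(b.get(x, x) for x in atom) for b in bindings for atom in head}

def saturate(rules, triples):
    """Apply all rules until no new triple is derived (naive fixpoint)."""
    triples = set(triples)
    while True:
        new = set()
        for body, head in rules:
            new |= apply_rule(body, head, triples)
        if new <= triples:
            return triples
        triples |= new


PRP_DOM = ([("?p", "rdfs:domain", "?c"), ("?x", "?p", "?y")],
           [("?x", "rdf:type", "?c")])
CAX_SCO = ([("?c1", "rdfs:subClassOf", "?c2"), ("?x", "rdf:type", "?c1")],
           [("?x", "rdf:type", "?c2")])
CLS_THING = ([], [("owl:Thing", "rdf:type", "owl:Class")])

data = {
    ("ex:knows", "rdfs:domain", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:alice", "ex:knows", "ex:bob"),
}
closure = saturate([PRP_DOM, CAX_SCO, CLS_THING], data)
print(len(closure))
```

Because every body has at most three atoms, each rule is a fixed join over the triple set, which is what makes the ruleset easy to port to SPARQL, Datalog or SQL engines.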

Unlike OWL RL/RDF, which encodes arbitrary-length lists in the
bodies of some of its rules, the bodies of OWL LD rules consist
solely of a fixed set of (at most three) ternary RDF atoms of
the form T(s, p, o) where $s, p, o \in C \cup V$. These restrictions
simplify the use of the OWL LD rules in a variety of tools. Excluding
datatype support, since the rules can only derive triples that are built
from the set C of RDF constants that originally occur in the ontology
and ruleset, the number of entailments is bounded by $|C|^3$. This
bound is tight, e.g., the rules entail all possible triples from the RDF
graph owl:sameAs owl:sameAs a ; rdfs:domain owl:Thing .
Optimisations for rule-based systems as explored in many works
can be applied to implement the OWL LD inferencing efficiently.
Systems can efficiently support datatypes by, e.g., only checking
entailments as needed, or using canonicalisation techniques, etc.
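
The tightness of the bound can be checked on the two-triple graph mentioned above. The sketch below hard-codes only the equality rules and prp-dom from Table 5 and runs them to a fixpoint. It makes two assumptions: the a in the example graph is read as rdf:type, and C is restricted to the four constants of the graph itself, ignoring the additional constants that the facts of the full ruleset would introduce. The function name closure is hypothetical.

```python
SAME, TYPE, DOMAIN, THING = "owl:sameAs", "rdf:type", "rdfs:domain", "owl:Thing"

def closure(graph):
    """Naive fixpoint over eq-ref, eq-sym, eq-trans, eq-rep-s/p/o and prp-dom."""
    triples = set(graph)
    while True:
        new = set()
        same = {(s, o) for s, p, o in triples if p == SAME}
        for s, p, o in triples:
            new |= {(s, SAME, s), (p, SAME, p), (o, SAME, o)}  # eq-ref
            for x, y in same:
                if x == s:
                    new.add((y, p, o))  # eq-rep-s
                if x == p:
                    new.add((s, y, o))  # eq-rep-p
                if x == o:
                    new.add((s, p, y))  # eq-rep-o
        for x, y in same:
            new.add((y, SAME, x))  # eq-sym
            new |= {(x, SAME, z) for y2, z in same if y2 == y}  # eq-trans
        for p, d, c in triples:
            if d == DOMAIN:
                new |= {(x, TYPE, c) for x, q, _ in triples if q == p}  # prp-dom
        if new <= triples:
            return triples
        triples |= new


graph = {(SAME, SAME, TYPE), (SAME, DOMAIN, THING)}
constants = {term for triple in graph for term in triple}
entailed = closure(graph)
print(len(constants), len(entailed))  # 4 64
```

The intuition is that prp-dom types every term as owl:Thing, the equality between owl:sameAs and rdf:type turns each such typing into an owl:sameAs statement, and the replacement rules then rewrite any triple into every other triple over the same constants.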

We are now left to describe the relationship between DS and RS 
for the OWL LD profile. 

Theorem 1. Let $R$ contain the OWL LD entailment rules (Table 5)
and let $O_1$ and $O_2$ be OWL 2 ontologies that satisfy the
OWL LD grammar and the following properties:

1. neither $O_1$ nor $O_2$ contains an IRI that is used for more than
one type of entity (i.e., no IRI is used both as, say, a class and
an individual);

2. $O_1$ does not contain SubAnnotationPropertyOf,
AnnotationPropertyDomain or AnnotationPropertyRange;

3. each axiom in $O_2$ is an assertion of one of the forms specified
below, for $a$, $a_1$ and $a_2$ named individuals:

(a) ClassAssertion(C a) where C is a class,

(b) ObjectPropertyAssertion(OP a1 a2) where OP is
an object property,

(c) DataPropertyAssertion(DP a1 a2) where DP is a
data property, or

(d) SameIndividual(a1 a2).

Furthermore, let $RDF(O_1)$ and $RDF(O_2)$ be translations of $O_1$ and
$O_2$, respectively, into RDF graphs [30]; and let $FO(RDF(O_1))$ and
$FO(RDF(O_2))$ be the translations of these graphs into first-order
theories in which triples are represented using the T predicate.
Then, $O_1$ entails $O_2$ under the OWL 2 Direct Semantics [26] iff
$FO(RDF(O_1)) \cup R$ entails $FO(RDF(O_2))$ under the standard
first-order semantics.

The proof of this correspondence theorem follows immediately
from the corresponding theorem for OWL RL [6, Theorem PR1]
together with the fact that OWL LD is a restriction of OWL RL.
As in the case of OWL RL, this result applies only to checking
the entailment of basic facts, not of OWL axioms in general.

7. RELATED WORK 

Here we discuss related studies on the use of RDFS and OWL
on the Web (related OWL profiles have been covered in Section 2).



One of the earliest comprehensive empirical studies of RDF Web 
data was presented by Ding et al. in 2005 [8]. They report on the
prevalence of vocabulary terms in over 1.5 million RDF/XML Web 
documents, where the bulk of data was described using the Friend 
of a Friend (FOAF) and Dublin Core (DC) ontologies. The work 
focuses on characterising the structure and distributions of the raw 
data rather than issues relating to semantics or to RDFS and OWL. 

Various works look at the syntactic profiles of OWL ontologies 
on the Web [2, 38, 7]. Bechhofer and Volz identify and categorise 
OWL DL restrictions violated by a sample group of 201 OWL on- 
tologies (all of which were found to be in OWL Full); these include 
incorrect or missing typing of classes and properties, complex ob- 
ject properties (e.g., functional properties) declared to be transi- 
tive, inverse-functional datatype properties, and so forth [2]. In a 
later survey, Wang et al. study 1,276 ontologies, of which 924
(72.4%) were identified as being in OWL Full, although they
proposed that 863 of these (93.4%) could be patched [38]. In a similar study,
d'Aquin et al. found that while 81% of 22,200 RDF Web docu- 
ments surveyed fell into OWL Full, from the features used, 95% 
would fall under the expressivity of the lightweight ALH(D)
Description Logic [7]. To summarise, these studies show that
restrictions laid out in the OWL standard (specifically for the OWL Lite
and OWL DL dialects) are not well-followed by Web ontologies, 
but that such ontologies are typically relatively inexpressive. These
works reinforce the need for our RS-based extension of OWL LD.

More recent papers focus on analysing owl:sameAs adoption
on the Web of Data [9, 14]. Ding et al. provide a quantitative
analysis of the owl:sameAs graph extracted from the BTC-2010
dataset (the ancestor of our corpus) [9], summarising the use of
owl:sameAs to link between different publishers of Linked Data.
In a similar vein, Halpin et al. focus on the incorrect use of
owl:sameAs [14]; they employ four human judges to manually
inspect 500 such links sampled from Web data, where their results
suggest that owl:sameAs is often used imprecisely, although
disagreement between the judges indicates that the quality of specific
owl:sameAs links can be subjective. Such surveys indicate that
reasoners must proceed cautiously when operating over Web data.

8. CONCLUSION 

We have presented a comprehensive analysis of the current use 
of OWL on the Web based on a large sample of RDF/XML docu- 
ments. We confirmed that OWL has indeed "arrived" on the Web 
of Data, albeit to varying degrees for different features. 

Following Linked Data principles, we used a PageRank algo- 
rithm to assess the importance of individual documents. Our results 
show that single-triple expressible OWL RL features are most im- 
portant on the Web. A survey of existing tools confirms that these 
simple features tend to receive better support. 

Based on these observations, we defined the OWL LD profile as 
a sub-language of OWL RL and provided a rule-based reasoning 
calculus for it. Though motivated by a new analysis of the current 
ontology documents on the Web of Data, OWL LD is well-aligned 
with the earlier proposals of RDFS-Plus and L2, indicating that it 
is a natural profile that can be motivated from various perspectives. 
We argue that this is also due to the syntactic restriction of OWL 
features to those that can be expressed using single RDF triples. 
What may appear to be a superficial syntactic feature at first glance
actually identifies exactly the cases in which OWL expressions are
fully aligned with the RDF data model. We argue that this bears
crucial advantages regarding not only tool support but also
usability. We therefore believe that, even if OWL as a whole might never
arrive on the Web of Data, the OWL LD profile is a natural fit for
ontological (a.k.a. vocabulary) modelling on the Web of Data.

| Group | ID | Body | Head |
|---|---|---|---|
| Equality | eq-ref | ?s ?p ?o . | ?s owl:sameAs ?s . ?p owl:sameAs ?p . ?o owl:sameAs ?o . |
| Equality | eq-sym | ?x owl:sameAs ?y . | ?y owl:sameAs ?x . |
| Equality | eq-trans | ?x owl:sameAs ?y . ?y owl:sameAs ?z . | ?x owl:sameAs ?z . |
| Equality | eq-rep-s | ?s owl:sameAs ?s' . ?s ?p ?o . | ?s' ?p ?o . |
| Equality | eq-rep-p | ?p owl:sameAs ?p' . ?s ?p ?o . | ?s ?p' ?o . |
| Equality | eq-rep-o | ?o owl:sameAs ?o' . ?s ?p ?o . | ?s ?p ?o' . |
| Equality | eq-diff1 | ?x owl:sameAs ?y . ?x owl:differentFrom ?y . | false |
| Properties | prp-ap | (for each core annotation property ?p) | ?p a owl:AnnotationProperty . |
| Properties | prp-dom | ?p rdfs:domain ?c . ?x ?p ?y . | ?x a ?c . |
| Properties | prp-rng | ?p rdfs:range ?c . ?x ?p ?y . | ?y a ?c . |
| Properties | prp-fp | ?p a owl:FunctionalProperty . ?x ?p ?y1 . ?x ?p ?y2 . | ?y1 owl:sameAs ?y2 . |
| Properties | prp-ifp | ?p a owl:InverseFunctionalProperty . ?x1 ?p ?y . ?x2 ?p ?y . | ?x1 owl:sameAs ?x2 . |
| Properties | prp-irp | ?p a owl:IrreflexiveProperty . ?x ?p ?x . | false |
| Properties | prp-symp | ?p a owl:SymmetricProperty . ?x ?p ?y . | ?y ?p ?x . |
| Properties | prp-asyp | ?p a owl:AsymmetricProperty . ?x ?p ?y . ?y ?p ?x . | false |
| Properties | prp-trp | ?p a owl:TransitiveProperty . ?x ?p ?y . ?y ?p ?z . | ?x ?p ?z . |
| Properties | prp-spo1 | ?p1 rdfs:subPropertyOf ?p2 . ?x ?p1 ?y . | ?x ?p2 ?y . |
| Properties | prp-eqp1 | ?p1 owl:equivalentProperty ?p2 . ?x ?p1 ?y . | ?x ?p2 ?y . |
| Properties | prp-eqp2 | ?p1 owl:equivalentProperty ?p2 . ?x ?p2 ?y . | ?x ?p1 ?y . |
| Properties | prp-pdw | ?p1 owl:propertyDisjointWith ?p2 . ?x ?p1 ?y . ?x ?p2 ?y . | false |
| Properties | prp-inv1 | ?p1 owl:inverseOf ?p2 . ?x ?p1 ?y . | ?y ?p2 ?x . |
| Properties | prp-inv2 | ?p1 owl:inverseOf ?p2 . ?x ?p2 ?y . | ?y ?p1 ?x . |
| Classes | cls-thing | | owl:Thing a owl:Class . |
| Classes | cls-nothing1 | | owl:Nothing a owl:Class . |
| Classes | cls-nothing2 | ?x a owl:Nothing . | false |
| Classes | cax-sco | ?c1 rdfs:subClassOf ?c2 . ?x a ?c1 . | ?x a ?c2 . |
| Classes | cax-eqc1 | ?c1 owl:equivalentClass ?c2 . ?x a ?c1 . | ?x a ?c2 . |
| Classes | cax-eqc2 | ?c1 owl:equivalentClass ?c2 . ?x a ?c2 . | ?x a ?c1 . |
| Classes | cax-dw | ?c1 owl:disjointWith ?c2 . ?x a ?c1 , ?c2 . | false |
| Datatypes | dt-type1 | (for each supported datatype ?dt) | ?dt a rdfs:Datatype . |
| Datatypes | dt-type2 | (for each literal ?lt in the value space of datatype ?dt) | ?lt a ?dt . |
| Datatypes | dt-eq | (for all ?lt1 and ?lt2 with the same data value) | ?lt1 owl:sameAs ?lt2 . |
| Datatypes | dt-diff | (for all ?lt1 and ?lt2 with different data values) | ?lt1 owl:differentFrom ?lt2 . |
| Datatypes | dt-not-type | ?lt a ?dt . (where ?lt is not in the value space of ?dt) | false |
| Schema Vocabulary | scm-cls | ?c a owl:Class . | ?c rdfs:subClassOf ?c . ?c rdfs:subClassOf owl:Thing . ?c owl:equivalentClass ?c . owl:Nothing rdfs:subClassOf ?c . |
| Schema Vocabulary | scm-sco | ?c1 rdfs:subClassOf ?c2 . ?c2 rdfs:subClassOf ?c3 . | ?c1 rdfs:subClassOf ?c3 . |
| Schema Vocabulary | scm-eqc1 | ?c1 owl:equivalentClass ?c2 . | ?c1 rdfs:subClassOf ?c2 . ?c2 rdfs:subClassOf ?c1 . |
| Schema Vocabulary | scm-eqc2 | ?c1 rdfs:subClassOf ?c2 . ?c2 rdfs:subClassOf ?c1 . | ?c1 owl:equivalentClass ?c2 . |
| Schema Vocabulary | scm-op | ?p a owl:ObjectProperty . | ?p rdfs:subPropertyOf ?p . ?p owl:equivalentProperty ?p . |
| Schema Vocabulary | scm-dp | ?p a owl:DatatypeProperty . | ?p rdfs:subPropertyOf ?p . ?p owl:equivalentProperty ?p . |
| Schema Vocabulary | scm-spo | ?p1 rdfs:subPropertyOf ?p2 . ?p2 rdfs:subPropertyOf ?p3 . | ?p1 rdfs:subPropertyOf ?p3 . |
| Schema Vocabulary | scm-eqp1 | ?p1 owl:equivalentProperty ?p2 . | ?p1 rdfs:subPropertyOf ?p2 . ?p2 rdfs:subPropertyOf ?p1 . |
| Schema Vocabulary | scm-eqp2 | ?p1 rdfs:subPropertyOf ?p2 . ?p2 rdfs:subPropertyOf ?p1 . | ?p1 owl:equivalentProperty ?p2 . |
| Schema Vocabulary | scm-dom1 | ?p rdfs:domain ?c1 . ?c1 rdfs:subClassOf ?c2 . | ?p rdfs:domain ?c2 . |
| Schema Vocabulary | scm-dom2 | ?p2 rdfs:domain ?c . ?p1 rdfs:subPropertyOf ?p2 . | ?p1 rdfs:domain ?c . |
| Schema Vocabulary | scm-rng1 | ?p rdfs:range ?c1 . ?c1 rdfs:subClassOf ?c2 . | ?p rdfs:range ?c2 . |
| Schema Vocabulary | scm-rng2 | ?p2 rdfs:range ?c . ?p1 rdfs:subPropertyOf ?p2 . | ?p1 rdfs:range ?c . |

Table 5: The OWL LD ruleset in Turtle/N3 style syntax where false in the head denotes inconsistency



